{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "4d1b897a",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/llm/ollama.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "2e33dced-e587-4397-81b3-d6606aa1738a",
   "metadata": {},
   "source": [
    "# Ollama + `gpt-oss` Cookbook\n",
    "\n",
    "OpenAI's latest open-source models, `gpt-oss`, [have been released](https://openai.com/open-models/).\n",
    "\n",
    "They come in two sizes:\n",
    "- 20 billion parameter model\n",
    "- 120 billion parameter model\n",
    "\n",
    "These models are Apache 2.0 licensed, and can be run locally on your machine. In this cookbook, we will use Ollama to demonstrate capabilities and test some claims of agentic and chain-of-thought behavior."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "5863dde9-84a0-4c33-ad52-cc767442f63f",
   "metadata": {},
   "source": [
    "## Setup\n",
    "First, follow the [readme](https://github.com/jmorganca/ollama) to set up and run a local Ollama instance.\n",
    "\n",
    "When the Ollama app is running on your local machine:\n",
    "- All of your local models are automatically served on localhost:11434\n",
    "- Select your model when setting llm = Ollama(..., model=\"<model family>:<version>\")\n",
    "- Increase defaullt timeout (30 seconds) if needed setting Ollama(..., request_timeout=300.0)\n",
    "- If you set llm = Ollama(..., model=\"<model family\") without a version it will simply look for latest\n",
    "- By default, the maximum context window for your model is used. You can manually set the `context_window` to limit memory usage."
   ]
  },
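  {
   "cell_type": "markdown",
   "id": "b7c2e4a1",
   "metadata": {},
   "source": [
    "The `<model family>:<version>` naming described above can be sketched in plain Python. This is only an illustration under a simplifying assumption: names are plain, with no registry host or namespace prefix. `split_model_tag` is a hypothetical helper written for this notebook, not part of Ollama or LlamaIndex."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c9d3f5b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "def split_model_tag(name: str) -> tuple[str, str]:\n",
    "    \"\"\"Split family:version into its parts; a missing version means latest.\"\"\"\n",
    "    # Assumption: plain names only (no registry host or namespace prefix).\n",
    "    family, _, version = name.partition(\":\")\n",
    "    return family, version or \"latest\"\n",
    "\n",
    "\n",
    "print(split_model_tag(\"gpt-oss:20b\"))\n",
    "print(split_model_tag(\"gpt-oss\"))"
   ]
  },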
  {
   "cell_type": "markdown",
   "id": "833bdb2b",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4816bcb9",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-llms-ollama"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d083b598",
   "metadata": {},
   "source": [
    "## Chain-of-thought / Thinking with `gpt-oss`\n",
    "\n",
    "Ollama supports configuration for thinking when using `gpt-oss` models. Let's test this out with a few examples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ad297f19-998f-4485-aa2f-d67020058b7d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llms.ollama import Ollama\n",
    "\n",
    "llm = Ollama(\n",
    "    model=\"gpt-oss:20b\",\n",
    "    request_timeout=360,\n",
    "    thinking=True,\n",
    "    temperature=1.0,\n",
    "    # Supports up to 130K tokens, lowering to save memory\n",
    "    context_window=8000,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7ffdcb58",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "====== THINKING ======\n",
      "We need to multiply 1234 by 5678. Let's compute: 1234 * 5678. Use long multiplication or mental: 1234 * 5678 = ?\n",
      "\n",
      "Compute 5678 * 1234. 5678 * 1000 = 5,678,000. 5678 * 200 = 1,135,600. 5678 * 30 = 170,340. 5678 * 4 = 22,712. Sum: 5,678,000 + 1,135,600 = 6,813,600. +170,340 = 6,983,940. +22,712 = 7,006,652. Let's verify: Another way: 1234*5678 = (1200+34)*(5678) = 1200*5678 + 34*5678. 1200*5678= 5678*12*100 = 68,136*100? Wait 5678*12 = 5678*10 + 5678*2 = 56,780 + 11,356 = 68,136. times 100 = 6,813,600. 34*5678 = 5678*30 + 5678*4 = 170,340 + 22,712 = 193,052. Sum 6,813,600 + 193,052 = 7,006,652. Yes.\n",
      "\n",
      "Thus answer is 7,006,652.\n",
      "====== ANSWER ======\n",
      "\\(1234 \\times 5678 = 7{,}006{,}652\\)."
     ]
    }
   ],
   "source": [
    "resp_gen = await llm.astream_complete(\"What is 1234 * 5678?\")\n",
    "\n",
    "still_thinking = True\n",
    "print(\"====== THINKING ======\")\n",
    "async for chunk in resp_gen:\n",
    "    if still_thinking and chunk.additional_kwargs.get(\"thinking_delta\"):\n",
    "        print(chunk.additional_kwargs[\"thinking_delta\"], end=\"\", flush=True)\n",
    "    elif still_thinking:\n",
    "        still_thinking = False\n",
    "        print(\"\\n====== ANSWER ======\")\n",
    "\n",
    "    if not still_thinking:\n",
    "        print(chunk.delta, end=\"\", flush=True)"
   ]
  },
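  {
   "cell_type": "markdown",
   "id": "e1a4d6c3",
   "metadata": {},
   "source": [
    "The loop above separates reasoning from the final answer by checking each chunk for a `thinking_delta` entry in `additional_kwargs`. The same idea can be tried offline, without a running model. The sketch below is a simplified variant that collects both parts into strings instead of printing them; `FakeChunk` and `split_stream` are hypothetical names for this notebook, and `FakeChunk` only mimics the two attributes the loop reads."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2b5e7d4",
   "metadata": {},
   "outputs": [],
   "source": [
    "from dataclasses import dataclass, field\n",
    "\n",
    "\n",
    "@dataclass\n",
    "class FakeChunk:\n",
    "    \"\"\"Hypothetical stand-in for a streamed response chunk.\"\"\"\n",
    "\n",
    "    delta: str = \"\"\n",
    "    additional_kwargs: dict = field(default_factory=dict)\n",
    "\n",
    "\n",
    "def split_stream(chunks):\n",
    "    \"\"\"Return (thinking, answer) accumulated from an iterable of chunks.\"\"\"\n",
    "    thinking, answer = [], []\n",
    "    for chunk in chunks:\n",
    "        thinking_delta = chunk.additional_kwargs.get(\"thinking_delta\")\n",
    "        if thinking_delta:\n",
    "            # Reasoning tokens arrive in additional_kwargs, not in delta.\n",
    "            thinking.append(thinking_delta)\n",
    "        elif chunk.delta:\n",
    "            # Anything else with a non-empty delta is part of the answer.\n",
    "            answer.append(chunk.delta)\n",
    "    return \"\".join(thinking), \"\".join(answer)\n",
    "\n",
    "\n",
    "chunks = [\n",
    "    FakeChunk(additional_kwargs={\"thinking_delta\": \"1234 * 5678 \"}),\n",
    "    FakeChunk(additional_kwargs={\"thinking_delta\": \"= 7,006,652\"}),\n",
    "    FakeChunk(delta=\"7,006,652\"),\n",
    "]\n",
    "thinking, answer = split_stream(chunks)\n",
    "print(thinking)\n",
    "print(answer)"
   ]
  },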
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3ba9503c-b440-43c6-a50c-676c79993813",
   "metadata": {},
   "source": [
    "## Creating agents with `gpt-oss`\n",
    "\n",
    "While giving a response from a prompt is fine, we can also incorporate tools to get more precise results, and build an agent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0739b1a3",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.agent.workflow import FunctionAgent\n",
    "from llama_index.llms.ollama import Ollama\n",
    "\n",
    "\n",
    "def multiply(a: int, b: int) -> int:\n",
    "    \"\"\"Multiply two numbers\"\"\"\n",
    "    return a * b\n",
    "\n",
    "\n",
    "llm = Ollama(\n",
    "    model=\"gpt-oss:20b\",\n",
    "    request_timeout=360,\n",
    "    thinking=False,\n",
    "    temperature=1.0,\n",
    "    # Supports up to 130K tokens, lowering to save memory\n",
    "    context_window=8000,\n",
    ")\n",
    "\n",
    "agent = FunctionAgent(\n",
    "    tools=[multiply],\n",
    "    llm=llm,\n",
    "    system_prompt=\"You are a helpful assistant that can multiply and add numbers. Always rely on tools for math operations.\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d3520122",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Tool call: multiply({'a': 1234, 'b': 5678}\n",
      "\n",
      "Tool call: multiply({'a': 1234, 'b': 5678}) -> 7006652\n",
      "The product is **7,006,652**."
     ]
    }
   ],
   "source": [
    "from llama_index.core.agent.workflow import (\n",
    "    ToolCall,\n",
    "    ToolCallResult,\n",
    "    AgentStream,\n",
    ")\n",
    "\n",
    "handler = agent.run(\"What is 1234 * 5678?\")\n",
    "async for ev in handler.stream_events():\n",
    "    if isinstance(ev, ToolCall):\n",
    "        print(f\"\\nTool call: {ev.tool_name}({ev.tool_kwargs}\")\n",
    "    elif isinstance(ev, ToolCallResult):\n",
    "        print(\n",
    "            f\"\\nTool call: {ev.tool_name}({ev.tool_kwargs}) -> {ev.tool_output}\"\n",
    "        )\n",
    "    elif isinstance(ev, AgentStream):\n",
    "        print(ev.delta, end=\"\", flush=True)\n",
    "\n",
    "resp = await handler"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4556b038",
   "metadata": {},
   "source": [
    "### Remembering past events with Agents\n",
    "\n",
    "By default, agent runs do not remember past events. However, using the `Context`, we can maintain state between calls. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "de5aa0f4",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.workflow import Context\n",
    "\n",
    "ctx = Context(agent)\n",
    "\n",
    "resp = await agent.run(\"What is 1234 * 5678?\", ctx=ctx)\n",
    "resp = await agent.run(\"What was the last question/answer pair?\", ctx=ctx)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "075da010",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**Last question:**  \n",
      "*“What is 1234 * 5678?”*  \n",
      "\n",
      "**Answer:**  \n",
      "*The product of 1234 and 5678 is 7,006,652.*\n"
     ]
    }
   ],
   "source": [
    "print(resp.response.content)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
